Recap the Bayesian regression model: $y \mid \beta \sim N(X\beta, \sigma^2 I_n)$ with prior $\beta \sim N(0, \tau^2 I_p)$ for a large $\tau$. We showed there that
$$\beta \mid y \sim N\!\big(\hat\beta_{LS},\ \sigma^2 (X^T X)^{-1}\big) \qquad (4.1)$$
when $\tau \to \infty$.
A different prior is $\beta \sim N(0, \tau^2 I_p)$ with a finite $\tau$; then
$$\beta \mid y \sim N\!\Big(\big(X^T X + \tfrac{\sigma^2}{\tau^2} I_p\big)^{-1} X^T y,\ \ \sigma^2 \big(X^T X + \tfrac{\sigma^2}{\tau^2} I_p\big)^{-1}\Big). \qquad (4.2)$$
When $\tau \to \infty$, (4.2) reduces to (4.1).
Now consider this prior for a small $\tau$, i.e., $\beta \sim N(0, \tau^2 I_p)$ with $\tau$ small, so the prior concentrates near $0$ and shrinks the estimate toward $0$. The marginal distribution of $y$ is
$$y \sim N\!\big(0,\ \sigma^2 I_n + \tau^2 X X^T\big). \qquad (4.3)$$
(4.2) and (4.3) will be proved later.
The posterior mean is then
$$E(\beta \mid y) = \Big(X^T X + \frac{\sigma^2}{\tau^2} I_p\Big)^{-1} X^T y. \qquad (4.4)$$
(Note this is closely related to (2.1).) Note that $\sigma^2/\tau^2$ plays the role of a ridge penalty. When $\tau$ is large, $\sigma^2/\tau^2 \approx 0$, so (4.4) becomes the least-squares estimate $(X^T X)^{-1} X^T y$, and in general (4.4) matches (2.1) if $\lambda = \sigma^2/\tau^2$.
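As a quick numerical illustration of (4.4), here is a minimal NumPy sketch; the dimensions and the values of $\sigma$, $\tau$, and the coefficients are made-up assumptions, not taken from the notes. It computes the posterior mean and the unregularized LS estimate that (4.4) approaches as $\tau$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])      # illustrative coefficients
sigma, tau = 0.5, 2.0                       # assumed noise sd and prior sd
y = X @ beta_true + sigma * rng.normal(size=n)

lam = sigma**2 / tau**2                     # ridge penalty implied by the prior
post_mean = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # (4.4)
ls = np.linalg.solve(X.T @ X, X.T @ y)      # LS estimate, the large-tau limit

print("posterior mean:", post_mean)
print("LS estimate:  ", ls)
```

Shrinking `tau` toward $0$ pulls `post_mean` toward the zero vector, matching the small-$\tau$ discussion above.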
2 Bayesian Approach for Dealing with Unknown $\beta$ and $\sigma$
Assume $\beta \sim N(0, \tau^2 I_p)$ and $\sigma \sim \mathrm{Unif}(0, C)$ for a large constant $C$; $X$ is the same as above. Then the prior joint density is
$$p(\beta, \sigma) \propto \exp\!\Big(-\frac{\|\beta\|^2}{2\tau^2}\Big)$$
(the indicator $1\{0 < \sigma < C\}$ has also been ignored because $C$ is large). The likelihood is
$$p(y \mid \beta, \sigma) = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\frac{\|y - X\beta\|^2}{2\sigma^2}\Big),$$
so the posterior is
$$p(\beta, \sigma \mid y) \propto \sigma^{-n} \exp\!\Big(-\frac{\|y - X\beta\|^2}{2\sigma^2} - \frac{\|\beta\|^2}{2\tau^2}\Big).$$
Convert the power to a quadratic in $\beta$ (complete the square):
$$\frac{\|y - X\beta\|^2}{\sigma^2} + \frac{\|\beta\|^2}{\tau^2} = (\beta - \mu)^T A (\beta - \mu) + y^T \big(\sigma^2 I_n + \tau^2 X X^T\big)^{-1} y,$$
where
$$A = \frac{X^T X}{\sigma^2} + \frac{I_p}{\tau^2}, \qquad \mu = \frac{1}{\sigma^2} A^{-1} X^T y.$$
Plug into the posterior: the dependence on $\beta$ is simple, entirely through the quadratic $(\beta - \mu)^T A (\beta - \mu)$, which implies
$$\beta \mid y, \sigma \sim N(\mu, A^{-1}),$$
while the $\beta$-free remainder is exactly the exponent of the marginal density of $y$. This proves (4.2) and (4.3) (note $\mu = (X^T X + \tfrac{\sigma^2}{\tau^2} I_p)^{-1} X^T y$ and $A^{-1} = \sigma^2 (X^T X + \tfrac{\sigma^2}{\tau^2} I_p)^{-1}$). Also, since the prior on $\sigma$ is flat, the same remainder viewed as a function of $\sigma$ yields the posterior $p(\sigma \mid y)$.
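As a sanity check on the completing-the-square step, the following NumPy sketch (with made-up dimensions and assumed values of $\sigma$ and $\tau$) evaluates the full exponent and the quadratic at two random points; their difference should be the same constant, $-\tfrac{1}{2}\, y^T(\sigma^2 I_n + \tau^2 X X^T)^{-1} y$, regardless of $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
sigma, tau = 0.7, 1.5                        # assumed values for illustration

A = X.T @ X / sigma**2 + np.eye(p) / tau**2  # posterior precision
mu = np.linalg.solve(A, X.T @ y / sigma**2)  # posterior mean

def exponent(b):
    # -(1/2) * (||y - Xb||^2 / sigma^2 + ||b||^2 / tau^2)
    return -0.5 * (np.sum((y - X @ b) ** 2) / sigma**2 + np.sum(b**2) / tau**2)

def quadratic(b):
    # -(1/2) * (b - mu)^T A (b - mu)
    d = b - mu
    return -0.5 * d @ A @ d

# The gap between the two must be the same constant for every beta
b1, b2 = rng.normal(size=p), rng.normal(size=p)
print(exponent(b1) - quadratic(b1))
print(exponent(b2) - quadratic(b2))
```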
3 Comments on Bayesian Regularization
In practice, maximizing $p(y \mid \tau)$ tends to prefer a $\tau$ that is neither too small nor too large, because $p(y \mid \tau)$ penalizes both extremes and is quite flat near its maximum. (Note that there is a big difference between $p(y \mid \tau)$ and $p(y \mid \beta)$.)
Maximizing $p(y \mid \beta)$ over $\beta$ leads to the unregularized LS estimate, leading to overfitting. On the other hand, maximizing $p(y \mid \tau)$ over $\tau$ leads to a fairly small estimate of $\tau$, leading to a smooth trend function. This can be understood by noticing
$$p(y \mid \tau) = \int p(y \mid \beta)\, p(\beta \mid \tau)\, d\beta.$$
When $\tau$ is large, $p(y \mid \tau)$ will be small simply because the normal prior density with variance $\tau^2$ will be flat for large $\tau$, spreading its mass over many $\beta$ that fit the data poorly. When $\tau$ is too small, $p(\beta \mid \tau)$ will be significant only for very smooth $\beta$ (those near $0$), but these will have poor values of $p(y \mid \beta)$.
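To see this tradeoff numerically, here is a sketch that evaluates $\log p(y \mid \tau)$ on a grid using the marginal distribution (4.3). All data-generating values are made-up assumptions, and $\sigma$ is treated as known for simplicity; the curve drops off at both extremes of $\tau$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 5
X = rng.normal(size=(n, p))
sigma = 0.5                                  # treated as known (assumption)
beta = rng.normal(size=p)                    # data generated with tau = 1
y = X @ beta + sigma * rng.normal(size=n)

def log_marginal(tau):
    # log N(y; 0, sigma^2 I + tau^2 X X^T), the marginal density from (4.3)
    C = sigma**2 * np.eye(n) + tau**2 * (X @ X.T)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

taus = np.linspace(0.05, 5.0, 200)
vals = [log_marginal(t) for t in taus]
print("tau maximizing p(y | tau):", taus[int(np.argmax(vals))])
```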
Model: $h = a + b f + \varepsilon$, i.e., height is a linear function of footprint length plus noise. But how to estimate $(a, b)$?
Well, given a guess, I do know how "bad" it is.
Denote our footprint lengths as $f_1, \dots, f_n$, and heights as $h_1, \dots, h_n$. If $(a, b)$ are known, we predict the heights as $\hat h_i = a + b f_i$, with residuals $r_i = h_i - \hat h_i$.
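A small sketch of this setup, with hypothetical footprint/height numbers invented for illustration: `badness` measures how "bad" a guess $(a, b)$ is via the sum of squared residuals, and `np.polyfit` returns the least-squares line that minimizes it.

```python
import numpy as np

# Hypothetical footprint lengths (cm) and heights (cm), invented for illustration
f = np.array([24.0, 25.5, 26.0, 27.2, 28.1])
h = np.array([165.0, 170.0, 172.0, 178.0, 181.0])

def badness(a, b):
    # Sum of squared residuals: how "bad" the guess (a, b) is
    return np.sum((h - (a + b * f)) ** 2)

# np.polyfit returns [slope, intercept] for degree 1: the LS minimizer of badness
b_hat, a_hat = np.polyfit(f, h, 1)
print("a =", a_hat, "b =", b_hat, "badness =", badness(a_hat, b_hat))
```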